Arabic Entity Graph Extraction Using Morphology, Finite State Machines, and Graph Transformations
Abstract
Research on automatic recognition of named entities in Arabic text uses techniques that work well for Latin-based languages, such as local grammars, statistical learning models, pattern matching, and rule-based techniques. These techniques boost their results by using application-specific corpora, parallel language corpora, and morphological stemming analysis. We propose a method for extracting entities, events, and the relations among them from Arabic text using a hierarchy of finite state machines driven by morphological features, such as part of speech and gloss tags, together with graph transformation algorithms. We evaluated our method on two natural language processing applications: extracting narrators and narrator relations from several corpora of Islamic narration books, and extracting genealogical family trees from Biblical texts. In both applications, our method reports high precision and recall and learns lemmas about phrases that improve results.
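As an illustration only, the general idea of a tag-driven finite state machine feeding a graph can be sketched in a few lines of Python. This is not the authors' implementation: the tag names (NAME, LINK, TRANSMIT, OTHER), the `Token` type, the function names, and the transliterated sample are all assumptions made for the sketch, and a real system would obtain the tags from an Arabic morphological analyzer rather than from hand annotation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Token:
    surface: str  # transliterated word (hypothetical sample data)
    tag: str      # hypothetical tag standing in for POS/gloss features


def extract_chain(tokens):
    """Two-state machine: OUTSIDE a name or INSIDE a name.

    NAME opens a name; NAME and LINK ("ibn", son of) extend it;
    any other tag closes it.
    """
    names, current = [], []
    state = "OUTSIDE"
    for tok in tokens:
        if state == "OUTSIDE":
            if tok.tag == "NAME":
                current = [tok.surface]
                state = "INSIDE"
        else:  # state == "INSIDE"
            if tok.tag in ("NAME", "LINK"):
                current.append(tok.surface)
            else:
                names.append(" ".join(current))
                current = []
                state = "OUTSIDE"
    if current:  # flush a name that runs to the end of the input
        names.append(" ".join(current))
    return names


def chain_to_graph(names):
    """Turn an ordered list of narrators into nodes and 'heard from' edges."""
    nodes = list(dict.fromkeys(names))        # de-duplicate, keep order
    edges = list(zip(names, names[1:]))       # consecutive narrators
    return nodes, edges


# Hand-tagged, transliterated sample (an assumption, not corpus data).
SAMPLE = [
    Token("haddathana", "TRANSMIT"),
    Token("Malik", "NAME"), Token("ibn", "LINK"), Token("Anas", "NAME"),
    Token("an", "TRANSMIT"),
    Token("Nafi", "NAME"),
    Token("an", "TRANSMIT"),
    Token("Abdullah", "NAME"), Token("ibn", "LINK"), Token("Umar", "NAME"),
    Token("qala", "OTHER"),
]

if __name__ == "__main__":
    narrators = extract_chain(SAMPLE)
    print(chain_to_graph(narrators))
```

The sketch uses a single flat machine for brevity; the abstract describes a hierarchy of machines, which would layer further machines (for example, one over the extracted names to recognize whole narration chains) on top of this one.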
Similar Resources
Arabic Cross-Document NLP for the Hadith and Biography Literature
Cross-document integration and reconciliation of extracted information has recently become of interest to researchers in Arabic natural language processing. Given a set of documents A, we use Arabic morphological analysis, finite state machines, and graph transformations to extract named entities Na and relations Ra expressed as edges in a graph G = 〈Na, Ra〉. We use the same techniques to extract e...
MERF: Morphology-based Entity and Relational Entity Extraction Framework for Arabic
Rule-based techniques and tools to extract entities and relational entities from documents allow users to specify desired entities using natural language questions, finite state automata, regular expressions, structured query language statements, or proprietary scripts. These techniques and tools require expertise in linguistics and programming and lack support of Arabic morphological analysis ...
Person Name Recognition for Arabic Using Injection of Name-Candidate Words into Conditional Random Fields
Named entity recognition and extraction are very important tasks for discovering proper names, including persons, locations, dates, and times, inside electronic textual resources. An accurate named entity recognition system is an essential utility to resolve fundamental problems in question answering systems, summary extraction, information retrieval and extraction, machine translation, video interpr...
An Optimal Approach to Local and Global Text Coherence Evaluation Combining Entity-based, Graph-based and Entropy-based Approaches
Text coherence evaluation has become a vital task in Natural Language Processing subfields such as text summarization, question answering, text generation, and machine translation. Existing methods such as entity-based and graph-based models track how nouns and noun phrases change roles across sequential sentences within a short part of a text. They even have limitations in global coheren...
From UML 2 Sequence Diagrams to State Machines by Graph Transformation
Algebraic graph transformation has been promoted by several authors as a means to specify model transformations. This paper explores how we can specify graph transformation-based rules for a classical problem of transforming from sequence diagrams to state machines. The specification of the transformation rules is based on the concrete syntax of sequence diagrams and state machines. We introduc...